Exploring idiomaticity with variant-based distributional measures and Shannon’s entropy
Abstract
The goal of this research is to investigate whether the syntactic and lexical fixedness of idiomatic expressions can be exploited to devise corpus-based indices of idiomaticity and compositionality, and whether these measures can predict human ratings of idiom syntactic flexibility. First, we describe a method for automatically distinguishing potential idioms from purely literal combinations via compositionality indices that leverage the greater lexical rigidity of idioms. Starting from two sets of idiomatic and literal Italian verbal constructions and adjective-noun pairs, we generated a series of lexical variants by replacing their constituents with semantically related words. We then represented both the original targets and their variants as vectors in a distributional space and calculated the cosine similarity between each target and its variants, expecting idiom vectors to be less similar to the vectors of their variants than literal expression vectors are to theirs. Overall, this proved to be the case, showing that focusing on the limited exchangeability of the constituents is an effective way to compute the idiomaticity degree of a given word combination. In the second part of our study, participants in a CrowdFlower questionnaire gave 1-7 acceptability scores to sentences containing Italian verbal idiomatic and literal combinations in different syntactic variants. We then modeled the human ratings with a hierarchical regression analysis using corpus-based measures computed for the same idioms. These included all the aforementioned compositionality indices as well as formal flexibility measures that used Shannon's entropy to quantify idiom variability with respect to various parameters, such as the morphology of the constituents and the presence and type of determiners. The promising results of this regression analysis support the cognitive plausibility of our computational indices as an account of how speakers process idioms.
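The two families of corpus-based measures described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the target-to-variant cosine similarities are aggregated with a plain mean (the abstract does not say how they are combined), entropy is computed in bits, and the toy vectors and determiner labels are invented for illustration rather than taken from the paper's data.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_variant_similarity(
    target: Sequence[float], variants: Sequence[Sequence[float]]
) -> float:
    """Hypothetical compositionality index: mean cosine between the vector of a
    target expression and the vectors of its lexical variants.
    Assumption: mean aggregation. Lower values mean the target behaves unlike
    its variants, i.e. lower lexical substitutability (more idiomatic)."""
    if not variants:
        raise ValueError("need at least one variant vector")
    return sum(cosine(target, v) for v in variants) / len(variants)


def shannon_entropy(observations: Iterable[str]) -> float:
    """Shannon entropy in bits of the distribution of observed forms, e.g. the
    determiner found in each corpus occurrence of an expression.
    Computed as sum p * log2(1/p); 0.0 means a fully fixed form."""
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    # Toy 2-d vectors (hypothetical, not from the paper's distributional space).
    literal = mean_variant_similarity([1.0, 0.0], [[0.8, 0.6], [1.0, 0.0]])
    idiom = mean_variant_similarity([1.0, 0.0], [[0.0, 1.0], [0.6, 0.8]])
    print(f"literal: {literal:.2f}, idiom: {idiom:.2f}")
    # Hypothetical determiner labels across corpus occurrences of two expressions.
    print(shannon_entropy(["il", "il", "un", "un"]))  # two equally frequent forms
    print(shannon_entropy(["il", "il", "il", "il"]))  # one fixed form
```

On the toy data, the hypothetical idiom's target vector is less similar to its variants than the literal one is, which is the pattern the abstract reports. An expression that always occurs with the same determiner gets an entropy of 0, signalling maximal fixedness on that parameter, while two equally frequent determiners give 1 bit.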
Similar resources
Lexical Variability and Compositionality: Investigating Idiomaticity with Distributional Semantic Models
In this work we carried out an idiom type identification task on a set of 90 Italian V-NP and V-PP constructions comprising both idioms and non-idioms. Lexical variants were generated from these expressions by replacing their components with semantically related words extracted distributionally and from the Italian section of MultiWordNet. Idiomatic phrases turned out to be less similar to thei...
Permutation Entropy for Random Binary Sequences
In this paper, we generalize the permutation entropy (PE) measure to binary sequences, which is based on Shannon’s entropy, and theoretically analyze this measure for random binary sequences. We deduce the theoretical value of PE for random binary sequences, which can be used to measure the randomness of binary sequences. We also reveal the relationship between this PE measure with other random...
A new hybrid method based on fuzzy Shannon’s Entropy and fuzzy COPRAS for CRM performance evaluation (Case: Mellat Bank)
Customer relationship management is a multiple perspective business paradigm which helps companies gaining competitive advantage through relationships with their customers. An integrated framework for evaluating CRM performance is an important issue which is not addressed completely in previous studies. The main purpose and the most important contribution of this study is introducing a framewor...
On Nonlinear Complexity and Shannon's Entropy of Finite Length Random Sequences
Pseudorandom binary sequences have important uses in many fields, such as spread spectrum communications, statistical sampling and cryptography. There are two kinds of method in evaluating the properties of sequences, one is based on the probability measure, and the other is based on the deterministic complexity measures. However, the relationship between these two methods still remains an inte...
Survey and comparative analysis of entropy and relative entropy thresholding techniques
Entropy-based image thresholding has received considerable interest in recent years. Two types of entropy are generally used as thresholding criteria: Shannon’s entropy and relative entropy, also known as Kullback–Leibler information distance, where the former measures uncertainty in an information source with an optimal threshold obtained by maximising Shannon’s entropy, whereas the latter mea...
Publication date: 2016